
Addition of the InitialDataAdder abstraction #383

Merged
merged 6 commits into from
Jul 11, 2023

Conversation

robertbartel
Contributor

A portion of a larger set of changes involved in moving to composite configuration dataset types. Breaking off fairly self-contained parts like this one for easier review.

Additions

  • A new InitialDataAdder abstract type and interface, used to add some initial data to a new dataset as it is being created

Changes

  • Modifying the DatasetManager interface so that its create function accepts an InitialDataAdder instead of a str for its initial_data param
  • Modifying the ObjectStoreManager implementation of DatasetManager so that its create function complies with the interface changes and uses the initial_data object as an InitialDataAdder appropriately
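As a rough sketch of the shape of the new abstraction, assuming only what the PR describes (the DictDataAdder subclass, its constructor parameters, and the dict-backed "storage" are hypothetical illustrations, not part of the actual dmod API):

```python
from abc import ABC, abstractmethod


class InitialDataAdder(ABC):
    """Encapsulates the logic for adding some initial data to a newly created dataset."""

    def __init__(self, dataset_name: str):
        self._dataset_name = dataset_name

    @abstractmethod
    def add_initial_data(self):
        """Add this instance's initial data to the dataset, raising on any failure."""
        ...


# Hypothetical concrete adder, for illustration only: "uploads" items
# into a plain dict standing in for a dataset's backing storage.
class DictDataAdder(InitialDataAdder):
    def __init__(self, dataset_name: str, items: dict, storage: dict):
        super().__init__(dataset_name)
        self._items = items
        self._storage = storage

    def add_initial_data(self):
        self._storage.update(self._items)
```

A manager's create function can then invoke add_initial_data() without needing to know where the data comes from or how it is obtained.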

Creating abstract InitialDataAdder type for use with creating datasets,
and modifying Dataset.create abstract function to use this type for the
'initial_data' parameter, rather than a string that represents the
location of some kind of initial data.
Updating DatasetManager implementation's create function to support
usage of the InitialDataAdder abstraction for the 'initial_data' param.
Updating to depend on core 0.9.0.
@robertbartel robertbartel added enhancement New feature or request maas MaaS Workstream labels Jul 10, 2023
@aaraney
Member

aaraney commented Jul 10, 2023

At first glance, it appears that this is adding responsibility to dataset creation that creation should not be responsible for. Do you think we could handle this use case if we instead introduced a dataset transaction or workflow abstraction? For starters, we could limit the functionality to operating on a single dataset. It seems easier to model data transformation pipelines using a workflow (weaker guarantees than a transaction) or a transaction.

@robertbartel
Contributor Author

At first glance, it appears that this is adding responsibility to dataset creation that creation should not be responsible for.

My perspective is actually the opposite. This is creating an abstract type dedicated to obtaining and adding initial data to a dataset. Regarding whether dataset creation should be tied to adding data ...

Do you think could handle this use case if we instead introduced a dataset transaction or workflow abstraction?

A transactional approach is essentially the intent here. If/when there is some expected initial data for a dataset, but it cannot be added, then DatasetManager.create() should (effectively) not create the dataset. For example, don't create a composite config dataset at all, even if we could/did add the realization config file to it just fine, if there is a failure copying files from the standalone BMI init config dataset we are supposed to copy from.

Of course, the manager must create that dataset in order to have something to add to. My solution was to ensure initial data is added inside create() by developing an abstraction, somewhat similar to a Visitor, that is passed to and used by create() and encapsulates the logic for adding any initial data. That way, if there is trouble, create() can catch any exceptions and clean up the bad dataset.
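The pattern described above might be sketched like this (a toy manager, not the real ObjectStoreDatasetManager; everything other than the create()/add_initial_data() interaction is illustrative):

```python
class ToyDatasetManager:
    """Minimal illustration of create() rolling back when initial data cannot be added."""

    def __init__(self):
        self.datasets = {}

    def create(self, name: str, initial_data=None):
        # The dataset must exist first, so there is something to add data to.
        self.datasets[name] = {}
        if initial_data is not None:
            try:
                # The adder encapsulates where the data comes from and how it is added.
                initial_data.add_initial_data()
            except Exception:
                # Initial data failed, so the dataset should (effectively) never exist.
                self.delete(name)
                raise
        return self.datasets[name]

    def delete(self, name: str):
        self.datasets.pop(name, None)
```

If add_initial_data() raises, the caller sees the exception, but no partially initialized dataset is left behind.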

This may make more sense after seeing some of the initial implementations. I just didn't include those here because they were in the dmod.dataservice package and not needed for this part.

@aaraney
Member

aaraney commented Jul 10, 2023

My perspective is actually the opposite. This is creating an abstract type dedicated to obtaining and adding initial data to a dataset. Regarding whether dataset creation should be tied to adding data ...

Right, create() is not responsible for the logic of uploading initial data, but it is responsible for making and handling the call. It is responsible for handling its own failures, and now for any failures when adding initial data. To be fair, I think I am overthinking and over-engineering this. In an ideal world, having a separate Transaction object that handles rollback logic akin to a db seems perfect for this problem. But given our storage layer and the problem at hand, implementing a transaction abstraction is probably overkill.

Member

@aaraney aaraney left a comment

It looks like a minor change is needed to clean up a bucket's objects before removing the bucket in the case of a failure.

# manager and the object store itself
except Exception as e:
    self.datasets.pop(name)
    self._client.remove_bucket(name)
Member

We will also need to remove any files that were possibly written to the bucket (i.e. if add_initial_data() successfully added a portion of its data). remove_bucket only removes empty buckets.

Fixing ObjectStoreDatasetManager create() function so that it calls the
delete() function if initial data fails to be added (and thus the
dataset should not end up created); the previous logic that just
removed the bucket would fail if the bucket had anything in it.
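The underlying constraint can be sketched with minio-style calls (the client below is a stub standing in for a real object store client, used only to illustrate why objects must be removed before the bucket; in the actual fix this cleanup lives in delete()):

```python
def remove_bucket_and_contents(client, bucket_name: str):
    """Remove every object in a bucket, then the bucket itself.

    remove_bucket() alone fails on a non-empty bucket, so any objects that
    add_initial_data() managed to write must be deleted first.
    """
    for obj in client.list_objects(bucket_name, recursive=True):
        client.remove_object(bucket_name, obj.object_name)
    client.remove_bucket(bucket_name)


# Minimal in-memory stub mimicking the relevant client calls, for illustration.
class _StubObject:
    def __init__(self, name):
        self.object_name = name


class StubClient:
    def __init__(self, buckets):
        self.buckets = buckets  # bucket name -> list of object names

    def list_objects(self, bucket_name, recursive=False):
        return [_StubObject(n) for n in list(self.buckets[bucket_name])]

    def remove_object(self, bucket_name, object_name):
        self.buckets[bucket_name].remove(object_name)

    def remove_bucket(self, bucket_name):
        if self.buckets[bucket_name]:
            raise RuntimeError("bucket not empty")
        del self.buckets[bucket_name]
```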
Member

@aaraney aaraney left a comment

Looks good to me! Thanks, @robertbartel!

@robertbartel robertbartel merged commit 16f17cd into NOAA-OWP:master Jul 11, 2023
1 check passed